
feat(results): append-only run index for fast Studio list view#1260

Closed
christso wants to merge 1 commit into main from p1-runs-index


@christso
Collaborator

Summary

Implements P1 of #1259 — adds an append-only index/runs.jsonl to the results repo so Studio's /api/runs reads one file instead of readdir+statSync+loadResultFile per run.

Index format

Each line in index/runs.jsonl is a JSON object:

{"run_id":"myexp::2026-05-21T10-00-00-000Z","timestamp":"2026-05-21T10:00:01.000Z","experiment":"myexp","target":"gpt-4o","test_count":5,"passed":4,"pass_rate":0.8,"avg_score":0.85,"size_bytes":12345,"tags":[],"sha":"abc123def"}

sha is the git commit SHA of the results-repo commit that added the run. It is backfilled via git commit --amend --no-edit immediately after the initial commit (before the push).
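
As a rough illustration of the format above, the entry shape and a tolerant reader might look like the following. This is a minimal sketch, not the PR's code: the field names come from the example line, but `formatRunIndexLine` and `parseRunIndex` are hypothetical names, and skipping malformed lines is an assumption based on the test plan's "read handles missing/malformed files".

```typescript
// Hypothetical entry shape mirroring the fields in the example line above.
interface RunIndexEntry {
  run_id: string;
  timestamp: string;
  experiment: string;
  target: string;
  test_count: number;
  passed: number;
  pass_rate: number;
  avg_score: number;
  size_bytes: number;
  tags: string[];
  sha?: string; // absent until it is backfilled after the commit
}

// Serialize one entry as a single JSONL line (one JSON object plus newline).
function formatRunIndexLine(entry: RunIndexEntry): string {
  return `${JSON.stringify(entry)}\n`;
}

// Parse JSONL text, skipping blank lines and lines that are not valid
// JSON objects with a string run_id (assumed behavior for malformed input).
function parseRunIndex(text: string): RunIndexEntry[] {
  const entries: RunIndexEntry[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const value: unknown = JSON.parse(trimmed);
      if (
        typeof value === "object" &&
        value !== null &&
        typeof (value as { run_id?: unknown }).run_id === "string"
      ) {
        entries.push(value as RunIndexEntry);
      }
    } catch {
      // Malformed line: ignore it and keep reading.
    }
  }
  return entries;
}
```

Because every line is an independent JSON object, appending a run never requires rewriting earlier lines, which is what makes the file append-only.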

How reads change

listMergedResultFiles (called by /api/runs every 5s) now:

  1. Checks for index/runs.jsonl in the cached results repo → if present, reads that single file (one file read, instead of readdir + statSync + loadResultFile for every run)
  2. Falls back to listResultFilesFromRunsDir (directory walk) if the index is missing — old repos keep working
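
The index-first read with a directory-walk fallback can be sketched roughly as below. The real `listMergedResultFiles` signature is not shown in this PR, so `listRuns`, `RunSources`, and the injected callbacks are hypothetical stand-ins; the sketch also assumes every index line is well-formed JSON.

```typescript
// Minimal metadata for illustration; the real result type has more fields.
type RunMeta = { run_id: string; timestamp: string };

// Hypothetical injected data sources, so the decision logic stays testable.
interface RunSources {
  // Returns the index file's text, or null when index/runs.jsonl is missing.
  readIndexText: () => string | null;
  // The older per-run directory walk, used only as a fallback.
  walkRunsDir: () => RunMeta[];
}

function listRuns(sources: RunSources): { runs: RunMeta[]; source: "index" | "walk" } {
  const text = sources.readIndexText();
  if (text === null) {
    // No index: an older repo, so keep working via the directory walk.
    return { runs: sources.walkRunsDir(), source: "walk" };
  }
  const runs = text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunMeta);
  return { runs, source: "index" };
}
```

Note that the directory walk is never invoked when the index exists, which is where the saving on a 5-second polling interval would come from.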

How writes change

directPushResults (called on auto_push) now accepts an optional indexEntry. When provided:

  1. Appends the entry (without sha) to index/runs.jsonl
  2. Commits run artifacts + index together
  3. Gets git rev-parse HEAD, patches the last index line with the sha
  4. Amends the commit (--no-edit)
  5. Pushes with the existing retry logic
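
Step 3 above (patching the last index line with the sha) might look like this in isolation. It is only a sketch: `patchLastLineSha` is a hypothetical name, and the surrounding git calls (`git rev-parse HEAD`, `git commit --amend --no-edit`) are left out so the function stays pure.

```typescript
// Add (or overwrite) the sha field on the last non-empty JSONL line,
// leaving every earlier line byte-for-byte unchanged.
function patchLastLineSha(indexText: string, sha: string): string {
  const lines = indexText.split("\n");
  // Walk back past the trailing empty string produced by the final newline.
  let i = lines.length - 1;
  while (i >= 0 && (lines[i] ?? "").trim() === "") i--;
  if (i < 0) throw new Error("index is empty; nothing to patch");
  const entry = JSON.parse(lines[i] ?? "") as Record<string, unknown>;
  lines[i] = JSON.stringify({ ...entry, sha });
  return lines.join("\n");
}
```

One property worth keeping in mind: `git commit --amend` creates a new commit object with a different hash, so a sha captured with `git rev-parse HEAD` before the amend identifies the pre-amend commit, not the commit that is eventually pushed.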

Migration command

agentv results reindex [--dir <cwd>] [--dry-run]

Walks the existing run tree, rebuilds index/runs.jsonl from scratch, commits and pushes. Run once after upgrading; new pushes maintain the index automatically.

Test plan

  • packages/core/test/evaluation/run-index.test.ts — 8 tests: append creates dirs, writes valid JSONL, appends without overwriting, sha round-trip, read handles missing/malformed files
  • apps/cli/test/commands/results/remote.test.ts — 5 tests: encode/decode/identify remote run IDs, index parse correctness, fallback when index absent
  • bun --filter @agentv/core test — 1782 pass, 0 fail
  • bun --filter agentv test — 553 pass, 0 fail
  • bun --filter @agentv/core typecheck — clean
  • bun --filter agentv typecheck — clean
  • bun run lint — clean
  • Pre-push hook (build + typecheck + lint + test + validate) — all passed

Closes part of #1259 (P1 sub-task).

🤖 Generated with Claude Code

…P1)

Adds index/runs.jsonl to the results repo so Studio's /api/runs reads ONE
file instead of readdir+statSync+loadResultFile per run (O(N) → O(1)).

Changes:
- RunIndexEntry interface (snake_case wire format): run_id, timestamp,
  experiment, target, test_count, passed, pass_rate, avg_score, size_bytes,
  tags, sha
- appendToRunIndex / readRunIndex helpers in packages/core
- directPushResults: writes index entry on each push; backfills commit sha
  via --amend after the initial commit
- reindexResultsRepo: rebuilds index from scratch for migration
- listMergedResultFiles: reads index/runs.jsonl first; falls back to
  directory walk for older repos without an index
- agentv results reindex: CLI command to backfill existing repos (--dry-run)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@cloudflare-workers-and-pages

Deploying agentv with Cloudflare Pages

Latest commit: f2d917a
Status: ✅  Deploy successful!
Preview URL: https://f1b2ce20.agentv.pages.dev
Branch Preview URL: https://p1-runs-index.agentv.pages.dev


@christso
Collaborator Author

Closing — the index-file approach was solving the right problem the wrong way.

Design pivot

After deeper review (and comparing with entireio + skillfully patterns), the cleaner architecture is:

  • Git is the canonical store, not a transport layer
  • No global index/runs.jsonl — the git tree IS the index (git ls-tree lists runs, git cat-file --batch reads existing benchmark.json blobs)
  • Eval writes directly to the local clone working tree, not to project-local .agentv/results/runs/
  • Reads use the git object DB (git ls-tree + git cat-file), no git checkout needed
  • Pagination via cursor over sorted run list
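
To make the "git tree IS the index" idea concrete, here is a rough sketch of two pieces: parsing the default `git ls-tree` output (`<mode> <type> <object>` followed by a tab and the path) and paging over a sorted run list with a cursor. The function names, the newest-first ordering, and the use of the last returned run ID as the cursor are all assumptions for illustration; the follow-up design may differ.

```typescript
interface TreeEntry {
  mode: string;
  type: string;
  object: string;
  path: string;
}

// Parse the default (non -z) output of `git ls-tree`, one entry per line.
function parseLsTree(output: string): TreeEntry[] {
  const entries: TreeEntry[] = [];
  for (const line of output.split("\n")) {
    const tab = line.indexOf("\t");
    if (tab === -1) continue; // blank or unexpected line
    const [mode, type, object] = line.slice(0, tab).split(" ");
    if (!mode || !type || !object) continue;
    entries.push({ mode, type, object, path: line.slice(tab + 1) });
  }
  return entries;
}

// Cursor pagination, newest first. Assumes run IDs sort chronologically as
// strings (for example, because they embed an ISO-like timestamp).
function pageRuns(
  runIds: string[],
  limit: number,
  cursor?: string,
): { items: string[]; next_cursor: string | null } {
  const sorted = [...runIds].sort().reverse();
  // The cursor is the last ID of the previous page; resume strictly after it.
  const rest = cursor === undefined ? sorted : sorted.filter((id) => id < cursor);
  const items = rest.slice(0, limit);
  const hasMore = rest.length > limit;
  const next_cursor = hasMore ? (items[items.length - 1] ?? null) : null;
  return { items, next_cursor };
}
```

A cursor keyed on the run ID stays stable when new runs are pushed between requests, whereas a numeric offset would shift. For paths containing unusual characters, `git ls-tree -z` avoids path quoting, at the cost of splitting on NUL instead of newline.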

This removes:

  • The append-only index file (drift, grows forever, sha-amend dance)
  • The agentv results reindex migration command (nothing to backfill — benchmark.json already exists per run)
  • The "local vs remote" merge in listMergedResultFiles (one source: the configured remote repo)
  • git checkout + git pull from sync (just git fetch)

A fresh PR will land the new architecture in one shot. Tracking issue #1259 will get a follow-up comment explaining how the P1-P6 breakdown collapses given the new design.

@christso christso closed this May 21, 2026
christso added a commit that referenced this pull request May 23, 2026
* docs: design plan for git-native results storage (#1259)

Captures the agreed architecture before implementation:
- Git is the canonical store; local clone is the working copy
- No separate index file — git tree IS the index
- Eval writes directly to clone working tree (not project-local .agentv/results/)
- Reads via git ls-tree + git cat-file --batch (no checkout)
- Pagination via cursor
- mode: github explicit in config (extension point)

Supersedes closed PR #1260. See docs/plans/git-native-results.md for full design.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(results): Pass 1 — config schema + path renames

- Add `mode: 'github'` as required field to ResultsConfig
- Repurpose `results.path` as optional local filesystem path for clone
  (default: ~/.agentv/results/<slug>/); reject old-style subdir values
  (e.g. 'runs') with a migration message
- Rename ResultsRepoCachePaths → ResultsRepoLocalPaths
- Rename getResultsRepoCachePaths → getResultsRepoLocalPaths
- Rename cache_dir → local_dir in ResultsRepoStatus wire format
- normalizeResultsConfig: fill default path, expand ~, include mode
- Remove redundant local normalizeResultsConfig copy in remote.ts
- Update config-validator.ts to enforce mode and filesystem-path rule
- Update tests for new schema

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(results): fix lint + update resolveResultsRepoRunsDir + serve tests

- Fix biome string-concat lint error (single template literal)
- resolveResultsRepoRunsDir: use normalized.path directly (new design)
- getResultsRepoStatus: check existsSync(normalized.path) for available,
  set local_dir to normalized.path
- serve.test.ts: update two tests to use mode:github schema and new
  default path layout (~/.agentv/results/<slug>/runs/...)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* wip: initial git-native listing skeleton + implementation goal

- Added listGitRuns() using git ls-tree + cat-file --batch
- Improved batch parser
- Saved implementation goal document

This is early progress toward the full git-native results implementation.
More to come in follow-up commits.

* fix: remove duplicate execFileAsync declaration

* feat(results): improve git-native listing metadata shape

- Enrich GitListedRun with display_name, test_count, avg_score, size_bytes
- Update remote.ts mapping to populate ResultFileMeta fields
- Read path now returns data Studio can render

* chore: update implementation goal + docker ownership fix

- Add user: ${UID}:${GID} to docker-compose for mounted repo permissions
- Update goal document with current status
- Reinstall dependencies in worktree

* fix(results): restore git-native run listing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(results): satisfy lint

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(test): stabilize git subprocess checks

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* chore(test): satisfy lint and timeouts

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* feat(results): finish git-native results flow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(results): complete remote-only studio flow

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* seed repo

* fix(test): isolate git env in serve regression

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(test): restore readme after temp repo setup

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(test): trim low-value flaky coverage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(results): materialize synced remote runs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(results): atomically materialize synced runs

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(studio): clarify remote results behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(cli): treat AGENTV_HOME log as info

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* docs(studio): refresh remote results screenshots

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Test User <test@example.com>